    “How Short is a Piece of String?”: An Investigation into the Impact of Text Length on Short-Text Classification Accuracy

    The recent increase in the widespread use of short messages, for example micro-blogs or SMS communications, has created an opportunity to harvest a vast amount of information through machine-based classification. However, traditional classification methods have failed to produce accuracies comparable to those obtained from similar classification of longer texts. Several approaches have been employed to extend traditional methods to overcome this problem, including the enhancement of the original texts through the construction of associations with external data enrichment sources, ranging from thesauri and semantic nets such as WordNet, to pre-built online taxonomies such as Wikipedia. Other avenues of investigation have used more formal extensions such as Latent Semantic Analysis (LSA) to extend or replace the more basic, traditional methods better suited to classification of longer texts. This work examines the changes in classification accuracy of a small selection of classification methods, using a variety of enhancement methods, as target text length decreases. The experimental data used is a corpus of micro-blog (Twitter) posts obtained from the ‘Sentiment140’ sentiment classification and analysis project run by Stanford University and described by Go, Bhayani and Huang (2009), which has been split into sub-corpora differentiated by text length.
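
    A minimal sketch of the length-binning experiment described above: split a tweet corpus into sub-corpora by character length and score a simple bag-of-words baseline on each. The file name, column layout, and bin edges are assumptions for illustration, not the study's actual setup.

        import pandas as pd
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import MultinomialNB
        from sklearn.pipeline import make_pipeline

        # Hypothetical Sentiment140-style file: one label column, one text column.
        tweets = pd.read_csv("sentiment140.csv", names=["label", "text"])
        tweets["length"] = tweets["text"].str.len()

        # Sub-corpora differentiated by text length (bin edges assumed).
        for lo, hi in [(0, 40), (40, 80), (80, 140)]:
            sub = tweets[(tweets["length"] >= lo) & (tweets["length"] < hi)]
            clf = make_pipeline(CountVectorizer(), MultinomialNB())
            acc = cross_val_score(clf, sub["text"], sub["label"], cv=5).mean()
            print(f"{lo}-{hi} chars: mean accuracy {acc:.3f}")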

    How Short is a Piece of String?: the Impact of Text Length and Text Augmentation on Short-text Classification Accuracy

    Recent increases in the use and availability of short messages have created opportunities to harvest vast amounts of information through machine-based classification. However, traditional classification methods have failed to yield accuracies comparable to classification accuracies on longer texts. Several approaches have previously been employed to extend traditional methods to overcome this problem, including the enhancement of the original texts through the construction of associations with external data supplementation sources. Existing literature does not precisely describe the impact of text length on classification performance. This work quantitatively examines the changes in accuracy of a small selection of classifiers, using a variety of enhancement methods, as text length progressively decreases. Findings, based on ANOVA testing at a 95% confidence interval, suggest that the performance of classifiers using simple enhancements decreases with decreasing text length, but that the use of more sophisticated enhancements risks over-supplementation of the text, with consequent concept drift and decreased classification performance, as text length increases.
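
    The ANOVA comparison can be sketched as a one-way test on per-fold accuracies grouped by text-length bin, as below; the accuracy values are placeholders rather than the study's results.

        from scipy.stats import f_oneway

        # Hypothetical cross-validation accuracies for three text-length bins.
        acc_short  = [0.61, 0.63, 0.60, 0.62, 0.59]
        acc_medium = [0.68, 0.70, 0.69, 0.67, 0.71]
        acc_long   = [0.74, 0.73, 0.75, 0.76, 0.72]

        # One-way ANOVA; reject equal mean accuracy at the 95% level if p < 0.05.
        f_stat, p_value = f_oneway(acc_short, acc_medium, acc_long)
        print(f"F = {f_stat:.2f}, p = {p_value:.4f}")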

    Improving identification & management of familial hypercholesterolaemia in primary care: pre- and post-intervention study

    Background and Aims: Familial hypercholesterolaemia (FH) is a major cause of premature heart disease but remains unrecognised in most patients. This study investigated whether a systematic primary care-based approach to identify and manage possible FH improves recommended best clinical practice. Methods: Pre- and post-intervention study in six UK general practices (population 45,033) which invited patients with total cholesterol >7.5 mmol/L to be assessed for possible FH. Compliance with national guideline recommendations to identify and manage possible FH (repeat cholesterol; assess family history of heart disease; identify secondary causes and clinical features; reduce total & LDL-cholesterol; statin prescribing; lifestyle advice) was assessed by calculating the absolute difference in measures of care pre- and six months post-intervention. Results: The intervention improved best clinical practice in 118 patients consenting to assessment (of 831 eligible patients): repeat cholesterol test (+75.4%, 95% CI 66.9-82.3); family history of heart disease assessed (+35.6%, 95% CI 27.0-44.2); diagnosis of secondary causes (+7.7%, 95% CI 4.1-13.9); examination of clinical features (+6.0%, 95% CI 2.9-11.7). For 32 patients diagnosed with possible FH using Simon-Broome criteria, statin prescribing significantly improved (+18.8%, 95% CI 8.9-35.3), with non-significant mean reductions in cholesterol post-intervention (total: -0.16 mmol/L, 95% CI -0.78-0.46; LDL: -0.12 mmol/L, 95% CI -0.81-0.57). Conclusions: Within six months, this simple primary care intervention improved both identification and management of patients with possible FH, in line with national evidence-based guidelines. Replicating and sustaining this approach across the country could lead to substantial improvement in health outcomes for these individuals with very high cardiovascular risk.
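
    As an illustration of the kind of pre/post comparison reported above, the sketch below computes an absolute difference in proportions with a normal-approximation 95% confidence interval. The counts are hypothetical, and the study's exact (possibly paired) estimator may differ.

        from math import sqrt

        def diff_ci(x_pre, n_pre, x_post, n_post, z=1.96):
            """Absolute difference in proportions with a normal-approximation 95% CI."""
            p_pre, p_post = x_pre / n_pre, x_post / n_post
            d = p_post - p_pre
            se = sqrt(p_pre * (1 - p_pre) / n_pre + p_post * (1 - p_post) / n_post)
            return d, (d - z * se, d + z * se)

        # Hypothetical counts for one measure of care (e.g. repeat cholesterol test).
        d, (lo, hi) = diff_ci(20, 118, 109, 118)
        print(f"absolute difference {d:+.1%}, 95% CI ({lo:.1%}, {hi:.1%})")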

    Hydrographic changes in the eastern subpolar North Atlantic during the last deglaciation

    Author Posting. © The Author(s), 2010. This is the author's version of the work. It is posted here by permission of Elsevier B.V. for personal use, not for redistribution. The definitive version was published in Quaternary Science Reviews 29 (2010): 3336-3345, doi:10.1016/j.quascirev.2010.08.013. Millennial-scale climate fluctuations of the last deglaciation have been tied to abrupt changes in the Atlantic Meridional Overturning Circulation (MOC). A key to understanding mechanisms of MOC collapse and recovery is the documentation of upper ocean hydrographic changes in the vicinity of North Atlantic deep convection sites. Here we present new high-resolution ocean temperature and δ18Osw records spanning the last deglaciation from an eastern subpolar North Atlantic site that lies along the flow path of the North Atlantic Current, approaching deep convection sites in the Labrador and Greenland-Iceland-Norwegian (GIN) Seas. High-resolution temperature and δ18Osw records from subpolar Site 980 help track the movement of the subpolar/subtropical front associated with temperature and Atlantic MOC changes throughout the last deglaciation. Distinct δ18Osw minima during Heinrich-1 (H1) and the Younger Dryas (YD) correspond with peaks in ice-rafted debris and periods of reduced Atlantic MOC, indicating the presence of melt water in this region that could have contributed to MOC reductions during these intervals. Increased tropical and subtropical δ18Osw during these periods of apparent freshening in the subpolar North Atlantic suggests a buildup of salt at low latitudes that served as a negative feedback on reduced Atlantic MOC. Support for this research was provided by the U.S. National Science Foundation (JFM and DWO) and a postdoctoral scholarship funded in part by the Gary Comer Science and Education Foundation (HB).
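
    The abstract does not state how the δ18Osw records were derived; a common approach, assumed here purely for illustration, pairs an independent temperature estimate (e.g. from Mg/Ca) with foraminiferal calcite δ18O through a paleotemperature relation such as the Bemis et al. (1998) low-light equation and solves for seawater δ18O:

        % Bemis et al. (1998) low-light equation, shown only as an illustrative
        % possibility; the 0.27 per mil term converts between the VSMOW (seawater)
        % and VPDB (calcite) scales.
        T = 16.5 - 4.80\,\bigl[\delta^{18}O_{c} - (\delta^{18}O_{sw} - 0.27)\bigr]
        \quad\Longrightarrow\quad
        \delta^{18}O_{sw} = \delta^{18}O_{c} + \frac{T - 16.5}{4.80} + 0.27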

    NJOY21: Next generation nuclear data processing capabilities

    NJOY is a well-respected code for nuclear data processing throughout the world. It was first publicly released in 1977 as a successor to MINX and has continuously improved its capabilities ever since. The latest release, NJOY2012, was released in December 2012, with its most recent update coming in February 2015. A new effort has begun at Los Alamos National Laboratory to ensure that NJOY remains a useful nuclear data processing code for the next generation of data processing needs. The result of this effort will be NJOY21, a new code for processing nuclear data and interacting with a variety of nuclear data files. Much has changed in the nuclear data world since NJOY was first released. Perhaps the biggest change is the increase in the amount of data: both in the number of available materials and the richness of the data for each material. While more and better nuclear data greatly improves the quality of simulations and calculations that rely on that data, it creates significant challenges for the individual who processes and verifies the nuclear data. NJOY2012 is well vetted and capable, but when processing many files/materials it is cumbersome and slow. NJOY21 will build on the success of the many major releases of NJOY made over the previous four decades. In addition, NJOY21 will facilitate the processing, verification, and validation of many nuclear data files.
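
    The "cumbersome and slow when processing many files" point can be illustrated with a batch driver like the sketch below, which loops NJOY2012 over a directory of ENDF files. The paths, deck template, and tape numbering are assumptions; only the convention of feeding NJOY an input deck on standard input and exchanging numbered "tape" files is taken from common NJOY usage.

        import shutil
        import subprocess
        from pathlib import Path

        ENDF_DIR = Path("endf")       # hypothetical directory of ENDF-6 files
        TEMPLATE = Path("deck.njoy")  # hypothetical NJOY input deck template
        WORK = Path("work")

        for endf_file in sorted(ENDF_DIR.glob("*.endf")):
            run_dir = WORK / endf_file.stem
            run_dir.mkdir(parents=True, exist_ok=True)
            # NJOY conventionally reads and writes numbered "tape" files in its
            # working directory; tape20 is the assumed input unit here.
            shutil.copy(endf_file, run_dir / "tape20")
            # Run NJOY with the deck on standard input (njoy < deck).
            with open(run_dir / "output", "w") as out:
                subprocess.run(["njoy"], input=TEMPLATE.read_text(), text=True,
                               cwd=run_dir, stdout=out, check=True)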

    Glacial/interglacial instabilities of the Western Boundary Under Current during the last 365 kyr from Sm/Nd ratios of the sedimentary clay-size fractions at ODP site 646 (Labrador Sea)

    We present 40 Sm-Nd isotope measurements of the clay-size (< 2 μm) fractions of sediments from the Southern Greenland rise (ODP-646) that span the last 365 kyr. These data track changes in the relative supply of fine particles carried into the deep Labrador Sea by the Western Boundary Under Current (WBUC) back through the last four glacial-interglacial cycles. Earlier studies revealed three general sources of particles to the core site: (i) Precambrian crustal material from Canada, Greenland, and/or Scandinavia (North American Shield, NAS), (ii) Palaeozoic or younger crustal material from East Greenland, NW Europe, and/or western Scandinavia (Young Crust, YC), and (iii) volcanic material from Iceland and the Mid-Atlantic Ridge (MAR). Clay-size fractions from glacial sediments have the lowest Nd isotopic ratios. Supplies of young crustal particles were similar during glacial oxygen isotope stages (OIS) 2, 6, and 10. In contrast, the mean volcanic contributions decreased relative to old craton material from OIS 10 to OIS 6 and then from OIS 6 to OIS 2. The glacial OIS 8 interval displays a mean Sm/Nd ratio similar to those of interglacials OIS 1, 5, and 9. Compared with other interglacials, OIS 7 was marked by a higher YC contribution but a similar ~30% MAR supply. The overall NAS contribution dropped by a factor of 2 during each glacial/interglacial transition, with the MAR contribution broadly replacing it during interglacials. To distinguish between higher supplies and/or dilution, particle fluxes from each end member were estimated. Glacial NAS fluxes were systematically higher than interglacial fluxes. During the time interval examined, fine particle supplies to the Labrador Sea were strongly controlled by proximal ice-margin erosion and thus echoed the glacial stage intensity. In contrast, the WBUC-carried MAR supplies from the eastern basins did not change significantly throughout the last 365 kyr, except for a marked increase in surface sediments that suggests unique modern conditions. Distal WBUC-controlled inputs from the Northern and NE North Atlantic seem to have been less variable than proximal supplies linked with glacial erosion rates.
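
    The end-member apportionment mentioned above can be sketched as a three end-member mixing calculation constrained by two tracers plus mass balance. The εNd and Sm/Nd values below are placeholders, not the paper's calibrated end members, and simple linear mixing in this space ignores the weighting by Nd concentration that a full treatment would include.

        import numpy as np

        # Rows: epsilon-Nd, Sm/Nd, mass balance. Columns: NAS, YC, MAR.
        # All end-member and sample values are hypothetical, for illustration only.
        end_members = np.array([
            [-27.0, -13.0,  8.0 ],  # epsilon-Nd
            [  0.16,  0.19, 0.34],  # Sm/Nd
            [  1.0,   1.0,  1.0 ],  # fractions sum to 1
        ])
        sample = np.array([-14.0, 0.215, 1.0])  # measured clay-size fraction

        # Solve for the mixing fractions of the three sources.
        fractions = np.linalg.solve(end_members, sample)
        for name, f in zip(["NAS", "YC", "MAR"], fractions):
            print(f"{name}: {f:.2f}")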